Systems and Methods for the Analysis of Protein Melt Curve Data

ABSTRACT

The present teachings relate to embodiments of systems and methods for the analysis of melt curve data for a plurality of samples. According to various embodiments, a melting temperature (T_(m)) may be determined across a range of different types of protein melt curve data, having variability over a plurality of analytical attributes, in order to accommodate the complexity of protein melt curve data. The combination of a plurality of samples, coupled with the complexity of the data, gives rise to a need to process the data in a manner that readily facilitates end-user analysis of the data. Various embodiments of an interactive graphical user interface (GUI) according to the present teachings provide for rapid and sequential changes that may be made by an end user to displayed protein melt curve data to allow such analysis.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/438,621, filed on Feb. 1, 2011, U.S. Provisional Patent Application No. 61/450,306, filed on Mar. 8, 2011, and U.S. Provisional Patent Application No. 61/496,980, filed on Jun. 14, 2011, all of which are incorporated herein by reference.

BACKGROUND

As one of ordinary skill in the art of protein chemistry may be apprised, protein melting curve data may vary considerably, and may display variability over a plurality of analytical attributes. Such analytical attributes may include, for example, but not limited by, curve shape, background signal, change in signal amplitude, and noise.

Systems and methods according to the present teachings for the analysis of protein melt curve data, in which a melting temperature (T_(m)) may be determined, address the need for objective and consistent analysis of protein melt curve data. For protein melt curve data, for example, in high throughput analyses, a plurality of protein samples may be processed, which may create a set of protein melt curve data displaying high variability over a range of analytical attributes.

The combination of a plurality of samples processed simultaneously, coupled with the complexity of the data, gives rise to a need to process the data in a manner that readily facilitates end-user evaluation of the data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that illustrates components of an exemplary computer system that may be utilized in the control and interface of a system used for processing protein samples for melt curve analysis.

FIG. 2 is a block diagram of an example of some instrument features that may be useful in the processing of protein samples for melt curve analysis.

FIG. 3 is a representation of an input/output diagram for various embodiments of an interactive GUI for the analysis of melt curve data.

FIG. 4 is a flow chart that depicts various embodiments of systems and methods for the analysis of protein melt curve data.

FIG. 5 is a flow chart that depicts various embodiments of systems and methods for the analysis of protein melt curve data.

FIG. 6 is a flow chart that depicts various embodiments of systems and methods for the analysis of protein melt curve data.

FIG. 7 is a graphical representation illustrating various embodiments of methods for peak selection for multiphase protein melting.

FIG. 8 is a graphical representation illustrating various embodiments of methods for peak selection for multiphase protein melting.

FIG. 9 is an exemplary window of an interactive GUI according to various embodiments of systems and methods according to the present teachings.

FIG. 10A and FIG. 10B are exemplary portions of embodiments of the GUI of FIG. 9, which display the effect of a selection of a function from an exemplary popup window.

FIG. 11 is an exemplary window of an interactive GUI according to various embodiments of systems and methods according to the present teachings.

FIG. 12 is an exemplary portion of embodiments of an interactive GUI of FIG. 9, which displays the interactive nature for an end user comparing various embodiments of a determination of a melt temperature (T_(m)).

FIG. 13 is an exemplary portion of embodiments of an interactive GUI of FIG. 9, which displays the interactive nature for an end user selecting a target temperature region for analysis for various embodiments of a determination of a melt temperature (T_(m)).

FIG. 14 is an exemplary portion of embodiments of an interactive GUI of FIG. 9, which displays the facility for viewing a fit of the data for selected data for various embodiments of a determination of a melt temperature (T_(m)).

FIG. 15 is an exemplary portion of embodiments of an interactive GUI of the present teachings, which displays the facility for viewing a fit of the data exhibiting multiphase melting.

FIG. 16 is an exemplary portion of embodiments of an interactive GUI of FIG. 14, which displays the facility for viewing a fit of the data exhibiting multiphase melting.

FIG. 17A and FIG. 17B display a feature of an interactive GUI according to various embodiments for a system providing protein melt analysis, which displays the facility for viewing a plurality of curves by aligning curves to a common y-axis.

FIG. 18 is a display feature of an interactive GUI according to various embodiments of the present teachings for conveying information concerning replicate data groups.

FIG. 19 depicts a chart according to various embodiments of systems and methods of an interactive GUI according to the present teachings, which depicts various conditions for which an end user may receive a flag notification.

FIG. 20 depicts an exemplary portion of a feature of an interactive GUI according to various embodiments for a system providing protein melt analysis, which displays the facility for viewing the effect of various parameters on replicate data groups.

FIG. 21 depicts an exemplary portion of a feature of an interactive GUI according to various embodiments for a system providing protein melt analysis, which displays the facility for viewing the effect of various parameters on replicate data groups via a selection from an exemplary popup window.

FIG. 22 is an exemplary portion of embodiments of an interactive GUI of FIG. 19, which displays the facility for viewing the effect of various parameters on replicate data groups.

FIG. 23 is an exemplary portion of embodiments of an interactive GUI of FIG. 19, which displays the facility for viewing the effect of various parameters on replicate data groups.

FIG. 24A and FIG. 24B are exemplary portions of embodiments of an interactive GUI of the present teachings, which display the facility for viewing a selected set of data exhibiting biphasic melt. FIG. 24C is an exemplary portion of embodiments of an interactive GUI of the present teachings, which displays the facility for viewing the effect of various parameters on replicate data groups from the data sets selected in FIG. 24A and FIG. 24B.

FIG. 25A depicts an exemplary portion of a feature of an interactive GUI according to various embodiments for a system providing protein melt analysis, which displays the facility for viewing results from a selected positive threshold value for ΔT_(m) via a selection from an exemplary popup window. FIG. 25B is an exemplary portion of embodiments of an interactive GUI of FIG. 25A, which displays the facility for viewing replicate data groups falling within the positive threshold selected.

FIG. 26A depicts an exemplary portion of a feature of an interactive GUI according to the present teachings, which displays the facility for viewing results from a selected negative threshold value for ΔT_(m) via a selection from an exemplary popup window. FIG. 26B is an exemplary portion of embodiments of an interactive GUI of FIG. 25A, which displays the facility for viewing replicate data groups falling within the negative threshold selected.

DETAILED DESCRIPTION

The present teachings relate to embodiments of systems and methods that readily facilitate end-user analysis of protein melt curve data. According to various embodiments, a melting temperature (T_(m)) may be determined from a protein thermal stability study across a range of different types of protein melt curve data, having variability over a plurality of analytical attributes. For various embodiments, analytical attributes may include, for example, but not limited by, curve shape, background signal, change in signal amplitude, and noise. Additionally, a plurality of samples may be processed under a variety of experimental conditions, thereby creating a substantial amount of data for an end user to evaluate. For various embodiments, given the complexity and amount of data generated, systems and methods of the present teachings provide ready facilitation of end-user analysis and evaluation of the data. According to various embodiments, an interactive graphical user interface (GUI) is provided to facilitate end-user analysis and evaluation of the data. In various embodiments, an interactive GUI may be an interactive tool providing various features that allow an end user to sequentially and rapidly analyze protein melt curve data. According to various embodiments, an interactive GUI may allow an end user to sequentially and rapidly analyze and evaluate protein melt curve data and subsets of data for the determination of a T_(m). For various embodiments, an interactive GUI may allow an end user to sequentially and rapidly analyze and evaluate protein melt curve data and subsets of data with respect to the replicate group data, such as the impact of a variety of experimental variables on the replicate data sets, as well as the central tendency and variance of replicates associated with a selected set of protein melt curve data.

One of ordinary skill in the art may recognize various assays utilizing the determination of the melting temperature (T_(m)) of a protein. The process in which a protein having, for example, a tertiary structure, goes from that tertiary structure to a random coil structure is referred to in the art as, for example, but not limited by, protein denaturation, protein unfolding, and protein melt. Additionally, a protein under various sample solution conditions may show a variation or shift in the observed T_(m) for that protein as a function of the sample solution conditions. Various terms such as thermal melt assays (TMA), thermal shift assay (TSA), protein thermal shift (PTS) analysis, and differential scanning fluorimetry (DSF) are examples of terms of the art in which the determination of the T_(m) of a protein or proteins is central to the analysis.

With respect to aspects of measurement science applied to protein chemistry, a change in detector signal amplitude may be observed as a function of the change in the folded state of a protein. In that regard, various analyses may be based on either the increase or decrease of fluorescence signal amplitude as it varies with respect to change in temperature applied to a protein sample.

For example, in various analyses, the signal amplitude may arise from an amino acid residue of the protein, such as tryptophan. As one of ordinary skill in the art is apprised, the intensity, quantum yield, and wavelength of maximum fluorescence emission of tryptophan are very solvent dependent. The fluorescence spectrum shifts to shorter wavelength and the intensity of the fluorescence increases as the polarity of the solvent surrounding the tryptophan residue decreases. Therefore, as a protein unfolds, buried tryptophan residues may be exposed to a more polar aqueous solvent environment, so that a decreasing signal amplitude may be observed from a folded to an unfolded state.

Instead of using an intrinsic signal arising from a protein molecule, other analyses may utilize a dye to indicate a folded state of a protein. For example, a fluorescent dye, such as Sypro® Orange, may be utilized to monitor the folded state of a protein. For Sypro® Orange in a polar solvent environment, quenching of the fluorescent signal is observed. For Sypro® Orange associated with the surface groups of a folded protein in solution, the dye is in an aqueous environment, so that its fluorescence signal is quenched. As a protein is unfolded, using, for example, thermal unfolding, hydrophobic regions or residues may be exposed. Sypro® Orange may then bind to hydrophobic regions or residues, and fluorescence may thereby be increased. For such a Sypro® Orange assay, an increasing signal amplitude going from a folded to an unfolded state may then be observed. Dyes such as 1-anilinonaphthalene-8-sulfonic acid (1,8-ANS) and 4,4′-Dianilino-1,1′-Binaphthyl-5,5′-Disulfonic Acid (Bis-ANS), which are quenched in aqueous environments, have been shown to be useful for monitoring protein folding, in which the fluorescence of 1,8-ANS and Bis-ANS may increase substantially in the process of, for example, protein refolding.

As one of ordinary skill in the art of protein sciences is apprised, monitoring protein thermal stability may be done in both academia and industry for a variety of reasons. For example, but not limited by, protein melt curve studies, or thermal studies, may be done for investigation of mutations to a target protein as a result of, for example, site directed mutagenesis studies. Additionally, protein thermal stability studies may be done to screen for the impact on protein stability due to a variety of in vitro processing and storage conditions. Such protein thermal stability studies may screen for the impact that a variety of additives, such as buffers, ligands, and organic agents, may have on the thermal stability of the protein of interest. High throughput screening of the binding of drug candidates to protein targets may also be monitored by the impact that the binding of a drug candidate may have on protein thermal stability. Accordingly, identifying the conditions that affect protein thermal stability may enhance the identification of a variety of desired conditions impacting protein purification, crystallization, and functional characterization.

As will be discussed in more detail subsequently, various embodiments of systems and methods may utilize detector signal data collected over the entirety of a defined temperature range for a protein melt assay. Such signals may be stored in a variety of computer readable media. In various embodiments according to the present teachings, a computer program product may be provided, which may include a tangible computer-readable storage medium whose contents include a program with instructions that when executed on a processor perform a method for providing an end user with the ability to sequentially and rapidly analyze and evaluate protein melt curve data.

FIG. 1 is a block diagram that illustrates a computer system 100 that may be employed to carry out processing functionality, according to various embodiments, upon which embodiments of the present teachings may be implemented. Computing system 100 can include one or more processors, such as a processor 104. Processor 104 can be implemented using a general or special purpose processing engine such as, for example, a microprocessor, controller or other control logic. In this example, processor 104 is connected to a bus 102 or other communication medium.

Further, it should be appreciated that a computing system 100 of FIG. 1 may be embodied in any of a number of forms, such as a rack-mounted computer, mainframe, supercomputer, server, client, a desktop computer, a laptop computer, a tablet computer, hand-held computing device (e.g., PDA, cell phone, smart phone, palmtop, etc.), cluster grid, netbook, embedded systems, or any other type of special or general purpose computing device as may be desirable or appropriate for a given application or environment. Additionally, a computing system 100 can include a conventional network system including a client/server environment and one or more database servers, or integration with LIS/LIMS infrastructure. A number of conventional network systems, including a local area network (LAN) or a wide area network (WAN), and including wireless and/or wired components, are known in the art. Additionally, client/server environments, database servers, and networks are well documented in the art.

Computing system 100 may include bus 102 or other communication mechanism for communicating information, and processor 104 coupled with bus 102 for processing information.

Computing system 100 also includes a memory 106, which can be a random access memory (RAM) or other dynamic memory, coupled to bus 102 for storing instructions to be executed by processor 104. Memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computing system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104.

Computing system 100 may also include a storage device 110, such as a magnetic disk, optical disk, or solid state drive (SSD), provided and coupled to bus 102 for storing information and instructions. Storage device 110 may include a media drive and a removable storage interface. A media drive may include a drive or other mechanism to support fixed or removable storage media, such as a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), flash drive, or other removable or fixed media drive. As these examples illustrate, the storage media may include a computer-readable storage medium having stored therein particular computer software, instructions, and/or data.

In alternative embodiments, storage device 110 may include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing system 100. Such instrumentalities may include, for example, a removable storage unit and an interface, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units and interfaces that allow software and data to be transferred from the storage device 110 to computing system 100.

Computing system 100 can also include a communications interface 118. Communications interface 118 can be used to allow software and data to be transferred between computing system 100 and external devices. Examples of communications interface 118 can include a modem, a network interface (such as an Ethernet or other NIC card), a communications port (such as, for example, a USB port or an RS-232C serial port), a PCMCIA slot and card, Bluetooth, and the like. Software and data transferred via communications interface 118 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 118. These signals may be transmitted and received by communications interface 118 via a channel such as a wireless medium, wire or cable, fiber optics, or other communications medium. Some examples of a channel include a phone line, a cellular phone link, an RF link, a network interface, a local or wide area network, and other communications channels.

Computing system 100 may be in communication through communications interface 118 with a display 112, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light-emitting diode (LED) display, for displaying information to a computer user. In various embodiments, computing system 100 may be coupled to a display through a bus. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104, for example. An input device may also be a display, such as an LCD display, configured with touch screen input capabilities. Another type of user input device is cursor control 116, such as a mouse, a trackball or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane. A computing system 100 provides data processing and provides a level of confidence for such data. Consistent with certain implementations of embodiments of the present teachings, data processing and confidence values are provided by computing system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in memory 106. Such instructions may be read into memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in memory 106 causes processor 104 to perform the process states described herein. Alternatively, hard-wired circuitry may be used in place of or in combination with software instructions to implement embodiments of the present teachings. Thus, implementations of embodiments of the present teachings are not limited to any specific combination of hardware circuitry and software.

The terms “computer-readable medium” and “computer program product” as used herein generally refer to any media that is involved in providing one or more sequences of one or more instructions to processor 104 for execution. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 100 to perform features or functions of embodiments of the present invention. These and other forms of computer-readable media may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, solid state, optical or magnetic disks, such as storage device 110. Volatile media includes dynamic memory, such as memory 106. Transmission media includes coaxial cables, copper wire, and fiber optics, including connectivity to bus 102.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computing system 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector coupled to bus 102 can receive the data carried in the infra-red signal and place the data on bus 102. Bus 102 carries the data to memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.

Those skilled in the art will recognize that the operations of the various embodiments may be implemented using hardware, software, firmware, or combinations thereof, as appropriate. For example, some processes can be carried out using processors or other digital circuitry under the control of software, firmware, or hard-wired logic. (The term “logic” herein refers to fixed hardware, programmable logic and/or an appropriate combination thereof, as would be recognized by one skilled in the art to carry out the recited functions.) Software and firmware can be stored on computer-readable media. Some other processes can be implemented using analog circuitry, as is well known to one of ordinary skill in the art. Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention.

It will be appreciated that, for clarity, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

Various embodiments of methods and systems for the analysis of protein melt curve data according to the present teachings may utilize various embodiments of a thermal cycler instrument as depicted in the block diagram shown in FIG. 2.

As previously mentioned, one way in which proteins may be unfolded is by using thermal unfolding, in which unfolding may proceed as temperature is increased. Various embodiments of systems and methods for the analysis of protein melt curves according to the present teachings may utilize various embodiments of a thermal cycler instrument as depicted in the block diagram shown in FIG. 2. As shown in FIG. 2, a thermal cycling instrument may include a heated cover 214 that is placed over a plurality of samples 216 contained in a sample support device. In various embodiments, a sample support device may be a glass, metal or plastic slide or substrate with a plurality of sample regions, which sample regions have a cover between the sample regions and heated cover 214. Some examples of a sample support device may include, but are not limited by, a multi-well plate, such as a standard 96-well microtiter plate, a 384-well plate, a micro device capable of processing thousands of samples per analysis or a microcard, or a substantially planar support, such as various microfluidic devices, microcard devices, and micro chip devices fabricated from, for example, but not limited by, a glass, metal or plastic slide or substrate. The sample regions in various embodiments of a sample support device may include depressions, indentations, holes, ridges, and combinations thereof, patterned in regular or irregular arrays formed on the surface of the slide or substrate. Various embodiments of a thermal cycler instrument may include a sample block 218, elements for heating and cooling 220, and a heat exchanger 222.

Various embodiments of a thermal cycler instrument can process multiple samples simultaneously, and may be used in the generation and acquisition of protein melt curve data. In FIG. 2, various embodiments of a thermal cycling system 200 provide a detection system for the run time acquisition of signals for each sample in a plurality of biological samples, over the entirety of the temperature range performed for generating protein melt curve data. A detection system may have an illumination source that emits electromagnetic energy, and a detector or imager 210, for receiving electromagnetic energy from samples 216 in a sample support device. Accordingly, though a thermal cycler instrument may be a useful platform for the generation and acquisition of protein melt curve data, one of ordinary skill in the art would recognize that an instrument having detection and sample thermostatting capabilities may be useful for generating protein melt curve data.

A control system 224 may be used to control the functions of the detection, heated cover, and thermal block assembly. The control system may be accessible to an end user through user interface 226 of thermal cycler instrument 200. A computer system 100, as depicted in FIG. 1, may serve to provide the control function of a thermal cycler instrument, as well as the user interface function. Additionally, computer system 100 may provide data processing, display and report preparation functions. All such instrument control functions may be dedicated locally to the thermal cycler instrument, or computer system 100 may provide remote control of part or all of the control, analysis, and reporting functions.

As previously described, a large volume of protein melt curve data may be generated as detector signal data collected over the entirety of a defined temperature range for a protein melt assay for each of a large number of samples analyzed during the same run. Given the large volume of data coupled with the complexity of protein melt curve data, various embodiments of systems and methods of the present teachings provide for embodiments of computer readable media that may generate processed data from initial protein melt curve data collected as detector signal output as a function of temperature for each sample in a sample support device.

Additionally, various embodiments of systems and methods of the present teachings provide for embodiments of computer readable media that may allow an end user the flexibility to dynamically analyze large data sets, and selected subsets thereof, using an interactive user interface. Such an interactive user interface may assist an end user in selection of, for example, but not limited by, a new set of analysis parameters, another method by which the data may be analyzed, the review of data for selected replicate sets of data, as well as the associated statistics for the replicate sets, and the review of which data sets may fall within a selected threshold in comparison to a target set of samples.

FIG. 3 depicts an input/output diagram meant to convey a process by which various embodiments of systems and methods for the analysis of protein melt curve data may provide an end user the ability to dynamically analyze large data sets of protein melt curve data. As depicted in FIG. 3, primary inputs may include, for example, but not limited by, plate set-up information, as well as the detector output signals collected for each sample over the entire run. Plate set-up information includes identifying sample names, and conditions being tested such as buffer, types of ligands or test compounds, type of protein sample, etc. In various embodiments, plate set-up information may be later used to identify replicate wells and present final results for each tested condition including replicate statistics. According to various embodiments of systems and methods of the present teachings, plate set-up information may be entered as primary input by an end user before the analysis and then may be imported into the analysis engine in various embodiments of an automated mode of generating results. Such information provides values for conditions, such as, but not limited by, sample type, sample concentration, buffer type, as well as numerous other assay conditions. In various embodiments, plate set-up information can be edited manually post-run as secondary input by an end user using manual assignment of values for assay conditions. For various embodiments of systems and methods of the present teachings, an analysis group may be defined by an end user either as primary input before a run or as secondary input during post-run analysis. In various embodiments, an end user may define sample data from an entire sample support device, such as a microtiter plate, as an analysis group. For various embodiments, an analysis group may comprise sample data from a plurality of sample support devices. In various embodiments, an analysis group may be defined by an end user as sample data from selected sample regions, such as wells from a microtiter plate, selected from one or a plurality of sample support devices. In various embodiments, sample data from sample regions selected from a single sample support device may be divided into a plurality of analysis groups. An analysis group may be comprised of data for one sample assayed under the same or different conditions, or may be comprised of a plurality of samples assayed under the same or different conditions, and any combination thereof. Accordingly, various embodiments of systems and methods of the present teachings provide the end user with the dynamic flexibility to define, for example, but not limited by, plate set-up information, analysis groups, analysis settings, and threshold settings.
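By way of non-limiting illustration, the relationship between plate set-up information, replicate wells, and analysis groups described above might be sketched in Python as follows. The names WellSetup, group_replicates, and define_analysis_group, and the particular condition fields chosen, are hypothetical and are assumed here only for illustration; they are not taken from the present teachings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WellSetup:
    """Hypothetical plate set-up record for one sample region (well)."""
    plate_id: str
    well: str
    sample_name: str
    protein: str
    buffer: str
    ligand: str = ""

    def condition_key(self):
        # Wells sharing every assay condition are treated as replicates.
        return (self.sample_name, self.protein, self.buffer, self.ligand)


def group_replicates(wells):
    """Map each distinct condition to the (plate, well) pairs that share it."""
    groups = defaultdict(list)
    for w in wells:
        groups[w.condition_key()].append((w.plate_id, w.well))
    return dict(groups)


def define_analysis_group(wells, selected):
    """Return the set-up records for a user-selected set of (plate, well) pairs.

    The selection may span one plate or several plates.
    """
    return [w for w in wells if (w.plate_id, w.well) in selected]


# Illustrative plate set-up: two replicate wells on plate P1, one well on P2.
wells = [
    WellSetup("P1", "A1", "S1", "lysozyme", "PBS"),
    WellSetup("P1", "A2", "S1", "lysozyme", "PBS"),
    WellSetup("P2", "A1", "S1", "lysozyme", "PBS", ligand="L1"),
]
replicates = group_replicates(wells)
cross_plate_group = define_analysis_group(wells, {("P1", "A1"), ("P2", "A1")})
```

In such a sketch, editing a well's conditions post-run (a secondary input) would simply change its condition key, so that replicate groups are recomputed from the revised plate set-up information.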

Various embodiments of computer readable media, depicted as the analysis engine in FIG. 3, can take primary or secondary input and generate processed melt curve data, for example, but not limited by, melt curve plots of detector signal response versus temperature, nth order derivative plots of the melt curve plots, a determination of a T_(m), flags for alerting an end user to various aspects of the data and analysis thereof, and replicate group statistics for groups of samples identified by the end user as replicates of a sample, in which various replicate groups may comprise an analysis group. Analysis settings in either an automated or manual mode utilize primary input, such as detector output and plate set-up information, which may be used to generate the well level results for each sample as indicated in FIG. 3. Plate set-up information input by the end user may also be used to compute replicate level results and statistics. In various embodiments of systems and methods for protein melt curve analysis, a user interface may display the results of the processed data from the primary inputs. For various embodiments of systems and methods of the present teachings, once having reviewed the display of the processed data from primary inputs, through a user interface, an end user may change parameters impacting data processing by the selection of secondary inputs. According to various embodiments of systems and methods for protein melt curve analysis of the present teachings, a secondary input is any user input occurring subsequent to the primary input. In that regard, for various embodiments of systems and methods of the present teachings, the number of ways that an end user may iteratively select parameters for analyzing and displaying data is unconstrained. Additionally, an end user may concurrently analyze data from any primary data stored on various types of computer readable media. 
In that regard, an end user may concurrently analyze data from different instruments, from different runs, from different experimental conditions, or any combinations under which an end user may desire to select and analyze protein melt curve data. Such parameters may include, for example, but not limited by, analysis settings, analysis thresholds, analysis mode, or selection of a method for how a T_(m) may be determined, methods for comparing a T_(m) of a sample or replicate group to another sample or replicate group, replicate group display as a function of user-selected experimental variables, and replicate group display as a function of a user-defined threshold. Plate set-up information may particularly impact results that require more than one well to generate, such as ΔT_(m), replicate level flags, and statistical analysis.

In FIG. 4-FIG. 6, various embodiments of methods for analyzing initial protein melt curve data are shown. In step 10 of FIG. 4 for method 300, FIG. 5 for method 310, and FIG. 6 for method 320, a data set of initial protein melt curve data is received by a processor for a plurality of samples. As previously described, the initial protein melt curve data comprises a detector signal as a function of temperature for each sample in the plurality of samples.

In reference to step 20 of FIG. 4-FIG. 6 for methods 300, 310, and 320, respectively, preprocessing each of a plurality of sample protein melt curves may be done to denoise the data collected from detection. As one of ordinary skill in the art of signal processing is apprised, denoising data may include process steps such as, but not limited by, cleaning, normalization, transformation, feature extraction, and feature selection. For various embodiments, a first global smoothing step may be done, in which the higher frequency noise components may be removed. In various embodiments, a Fourier transform smoothing may be applied. According to various embodiments, a second local smoothing may be done. In various embodiments, a local regression smoothing may be done, in which a sample melt curve is smoothed sequentially over a defined window. For various embodiments, a window may be selected based on factors such as the number of data points and the system noise. According to various embodiments, a local smoothing function such as, but not limited by, a quadratic regression, a linear regression, and a Savitzky-Golay smoothing function may be applied. In various embodiments, a robust quadratic or linear smoothing function may be used.
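
By way of a non-limiting illustration, a local smoothing step of the kind described above may be sketched in Python as follows. The sketch assumes evenly spaced temperature readings and a fixed five-point window using the standard quadratic Savitzky-Golay weights. The name savgol5 and the choice to leave the two readings at each end unsmoothed are assumptions made for this example only, and the global Fourier smoothing step is not shown.

```python
# Hypothetical helper, not taken from the present teachings.
# Standard 5-point quadratic/cubic Savitzky-Golay weights and their divisor.
SG5_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(signal):
    """Smooth evenly spaced readings with a 5-point Savitzky-Golay window.

    The two readings at each end have no full window and are copied through
    unchanged (an assumption of this sketch).
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * y for c, y in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed
```

A wider window removes more noise but flattens sharp transitions, which is consistent with selecting the window from the number of data points and the system noise.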

With reference to step 30 of FIG. 4 of method 300 and step 40 of FIG. 5 of method 310, for various systems and methods for the analysis of protein melt curve data, after a step of preprocessing data, a step of selecting a region of analysis may be done.

According to various embodiments of systems and methods for the analysis of protein melt curve data, a Boltzmann fit may be applied to a sample protein melt curve, after a step of identifying the region of analysis. According to various embodiments, an equation describing a Boltzmann fit may be given by:

$\begin{matrix}{F_{{Boltzmann}{(T)}} = {F_{T_{initial}} + \frac{( {F_{T_{final}} - F_{T_{initial}}} )}{1 + {e^{\hat{}}\lbrack {{Tm} - {T/C}} \rbrack}}}} & ( {{Eq}.\mspace{14mu} 1} )\end{matrix}$

where:

F_(T_(initial)) = signal amplitude for an initial temperature over which the data is fit

F_(T_(final)) = signal amplitude for a final temperature over which the data is fit

T=a temperature for any data point between T_(initial) and T_(final)

T_(m)=the protein melting temperature for the curve; to be solved for in the fit

C=a constant
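
For illustration only, Eq. 1 may be evaluated as in the following Python sketch, which reads the exponent as (T_(m) − T)/C, the usual form of a Boltzmann sigmoid. The function and argument names are hypothetical. Under this reading, the fitted signal at T = T_(m) lies midway between the initial and final signal amplitudes, because the exponential term equals 1 there.

```python
import math


def boltzmann(temp, f_initial, f_final, tm, c):
    """Evaluate the Boltzmann model at one temperature.

    Assumes the exponent of Eq. 1 is (tm - temp) / c. The exponent is clamped
    so that math.exp cannot overflow far from the transition.
    """
    u = max(-700.0, min(700.0, (tm - temp) / c))
    return f_initial + (f_final - f_initial) / (1.0 + math.exp(u))
```

A smaller value of the constant C gives a steeper transition around T_(m), and a larger value gives a broader one.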

As can be seen by inspection of Eq. 1, the Boltzmann fitting function has a term for a signal amplitude at an initial temperature and a signal amplitude at a final temperature. According to various embodiments, as indicated in step 30 of FIG. 4 for method 300 and step 40 of FIG. 5 for method 310, a region of analysis defining an initial and a final temperature for fitting the data may be identified. According to various embodiments, as indicated in step 30 of FIG. 5 for method 310 and FIG. 6 for method 320, at least one nth order derivative of a sample protein melt curve may be taken for selecting a region of analysis. In various embodiments, a first derivative of the data may be taken. For various embodiments, a first derivative of the smoothed data may be taken, and the derivative signal may be further smoothed to remove high frequency components. Accordingly, for various embodiments, regions of monotonic rise of signal may be identified from regions of positive signal value on this smoothed derivative profile. In various embodiments, the longest and steepest segment of signal rise may be selected as the region of analysis. According to various embodiments, higher order derivatives may be taken of the initial or smoothed data. In various embodiments, the derivative signal may undergo various scaling (such as inversion) to improve the mathematical and/or data presentation properties of the signal, so that the region of analysis may be identified accurately.
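
The selection of a region of analysis from a first derivative may be sketched, under stated assumptions, as follows. The text does not specify how "longest and steepest" are combined, so this Python example ranks each rising segment by its total signal rise, which grows with both length and slope. That scoring rule and the function names are assumptions of the sketch, and the derivative is not smoothed here.

```python
def first_derivative(temps, signal):
    """Finite-difference slope between each pair of consecutive readings."""
    return [(signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
            for i in range(len(signal) - 1)]


def region_of_analysis(temps, signal):
    """Return (t_start, t_end) of the rising segment with the largest rise.

    Returns None when the signal never rises.
    """
    deriv = first_derivative(temps, signal)
    best = None   # (total_rise, start_index, end_index)
    start = None  # index where the current rising segment began
    # A trailing 0.0 acts as a sentinel that closes a segment reaching the end.
    for i, slope in enumerate(deriv + [0.0]):
        if slope > 0 and start is None:
            start = i
        elif slope <= 0 and start is not None:
            rise = signal[i] - signal[start]
            if best is None or rise > best[0]:
                best = (rise, start, i)
            start = None
    if best is None:
        return None
    return temps[best[1]], temps[best[2]]
```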

As depicted in step 40 of FIG. 4 for method 300 and step 50 of FIG. 5 for method 310, a best-fit melting temperature T_(m) for a sample curve may be found. According to various embodiments, and in reference to Eq. 1, a best-fit T_(m) may be derived from the Boltzmann fitting function when a best fit has been determined. According to various embodiments, the constant, C, is solved for in the fitting process against sample protein melt curve data through a self-consistent process, in which the constant is defined in an iterative process of fitting the data to Eq. 1. For various embodiments, a best fit may be converged upon in the fitting process when a mean-squared error term converges on a threshold value. According to various embodiments, an algorithm such as the Levenberg-Marquardt algorithm may be used to search the parameters of a model for values at which the error between data, such as protein melt curve data, and a nonlinear least squares fit to such data reaches a minimum.
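
As a minimal sketch of such a fitting process, and not a definitive implementation, the following Python example applies a basic Levenberg-Marquardt iteration to the two free parameters T_(m) and C, holding the initial and final signal amplitudes fixed. It assumes the exponent of Eq. 1 is (T_(m) − T)/C. The function names, the starting damping value, and the stopping rules are all assumptions of the example.

```python
import math


def _s(temp, tm, c):
    """Sigmoid term 1 / (1 + exp((tm - temp) / c)) with a clamped exponent."""
    u = max(-700.0, min(700.0, (tm - temp) / c))
    return 1.0 / (1.0 + math.exp(u))


def _mse(temps, signal, f_i, f_f, tm, c):
    """Mean-squared error between the data and the Boltzmann model."""
    return sum((y - (f_i + (f_f - f_i) * _s(t, tm, c))) ** 2
               for t, y in zip(temps, signal)) / len(temps)


def fit_boltzmann(temps, signal, f_i, f_f, tm0, c0, mse_tol=1e-12, max_iter=200):
    """Search (tm, c) by damped Gauss-Newton steps; returns (tm, c)."""
    tm, c, lam = tm0, c0, 1e-3
    err = _mse(temps, signal, f_i, f_f, tm, c)
    for _ in range(max_iter):
        if err < mse_tol or lam > 1e12:
            break  # converged on the threshold, or no step improves the fit
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, y in zip(temps, signal):
            s = _s(t, tm, c)
            resid = y - (f_i + (f_f - f_i) * s)
            slope = (f_f - f_i) * s * (1.0 - s)
            j_tm = -slope / c                 # d(model)/d(tm)
            j_c = slope * (tm - t) / (c * c)  # d(model)/d(c)
            a11 += j_tm * j_tm
            a12 += j_tm * j_c
            a22 += j_c * j_c
            g1 += j_tm * resid
            g2 += j_c * resid
        # Damped 2x2 normal equations, solved in closed form.
        d11 = a11 * (1.0 + lam)
        d22 = a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det <= 0.0:
            break
        new_tm = tm + (d22 * g1 - a12 * g2) / det
        new_c = c + (d11 * g2 - a12 * g1) / det
        new_err = (_mse(temps, signal, f_i, f_f, new_tm, new_c)
                   if new_c > 0 else float("inf"))
        if new_err < err:
            tm, c, err = new_tm, new_c, new_err
            lam = max(lam / 10.0, 1e-9)  # accept the step, trust the model more
        else:
            lam *= 10.0                  # reject the step, damp more heavily
    return tm, c
```

A reasonable starting guess for tm0 is the midpoint of the region of analysis. In practice a library routine for nonlinear least squares would usually be preferred over a hand-written loop.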

For various embodiments of method 300, as shown in FIG. 4, method 310 of FIG. 5, and method 320 of FIG. 6, a Boltzmann equation, as shown in Eq. 1, may provide a fit to a variety of protein melt curve data, having variability over a plurality of analytical attributes. For various embodiments, analytical attributes may include, for example, but not limited by, curve shape, background signal, change in signal amplitude, and noise.

According to various embodiments of method 310 of FIG. 5 and method 320 of FIG. 6, in addition to an nth order derivative providing the basis for identifying a region of analysis, a T_(m) value may be determined by using an nth order derivative. According to various systems and methods of the present teachings, an end user may use a T_(m) value determined by using an nth order derivative to compare to a T_(m) value determined by Boltzmann fit. In various embodiments, an end user may select either the Boltzmann determined T_(m) value or the T_(m) value determined by an nth order derivative.
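
One simple way to obtain a T_(m) value from a first derivative, offered here only as a sketch, is to take the temperature at which the slope of the melt curve is greatest. The Python example below uses central differences on the interior readings. The function name is hypothetical, and a single-phase curve is assumed.

```python
def tm_from_first_derivative(temps, signal):
    """Return the temperature at which the central-difference slope peaks."""
    best_slope, best_temp = None, None
    for i in range(1, len(signal) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if best_slope is None or slope > best_slope:
            best_slope, best_temp = slope, temps[i]
    return best_temp
```

The value returned is limited to the temperature grid of the data, so it may differ slightly from a T_(m) obtained by a Boltzmann fit to the same curve.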

For various embodiments of method 320 of FIG. 6, a moving threshold may be used on nth order derivative data to identify peaks. For various embodiments of method 320 of FIG. 6, steps 10-30 may be performed as described previously for the corresponding steps of method 300 of FIG. 4 and method 310 of FIG. 5. As one of ordinary skill in the art is apprised, proteins may undergo multiphasic melting. Accordingly, for such proteins, there may be a plurality of T_(m) values that may be determined for a melt curve of a protein undergoing multiphasic melting. An end user may analyze a multiphasic melt curve using a Boltzmann fit to various selected regions of analysis, according to various embodiments of method 300 of FIG. 4 and method 310 of FIG. 5. Additionally, an end user may analyze a multiphasic melt curve according to various embodiments of method 320 of FIG. 6.

For various embodiments of method 320 of FIG. 6, at step 40, a region of analysis may be selected within signal limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve. In various embodiments, limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve may be from about 20% to about 99% of the signal value. For various embodiments, limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve may be from about 10% to about 99% of the signal value. For various embodiments, the lower limit may be selected so that it is clearly at or above an analytical signal distinguished from background noise. In various embodiments, an end user may select the limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve.

According to various embodiments of method 320 of FIG. 6, at step 50, a threshold value may be sequentially moved in a stepwise fashion within the limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve. According to various embodiments, the number of threshold values taken in a stepwise fashion may be from about 50 threshold values to about 1000 threshold values. In various embodiments, the number of threshold values taken in a stepwise fashion may be from about 200 threshold values to about 600 threshold values. For various embodiments, the number of threshold values taken in a stepwise fashion may be selected by an end user. For example, there may be features in an nth derivative plot of a multiphasic melt curve, such as shoulder peaks and noise, which vary from assay to assay, and instrument to instrument. For an nth derivative plot of a multiphasic melt curve having small shoulder features, a greater number of steps may be necessary in order to analyze such features. In contrast, for noisy data, too many steps may result in analyzing artifacts. Additionally, increasing the number of threshold values taken in a stepwise fashion between signal limits R₁ and R₂ increases the analysis time.

According to various embodiments of method 320 of FIG. 6, at step 50, a peak may be identified as a contiguous region that falls above a threshold at any one threshold value at any one step. In various embodiments, for a contiguous region in which more than one peak may be visually apparent in data inspected by an end user, the peak of greatest magnitude may be counted as the peak at that step. For example, in FIG. 7, a first derivative graph of a multiphasic melt for a protein is depicted. In FIG. 7, each of lines I-VI represents a threshold value that was selected in one of six different steps. At each of sequential thresholds I-II, a peak may be defined as a contiguous region above the threshold. For example, in FIG. 7, at threshold I of step 1, 2 peaks (P₁ and P₂) would be determined, while at threshold II of step 2, 3 peaks (P₁, P₂, and P₃) would be determined. However, at threshold III of step 3, a contiguous region including P₂ and P₃ occurs, so that only P₂, the peak of greatest magnitude, is counted. Then, for threshold III of step 3, 2 peaks (P₁ and P₃) are counted. Finally, in threshold VI of step 6, three peaks (P₁, P₂, and P₄) are counted.
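
The stepwise threshold procedure described above may be sketched in Python as follows. The sketch assumes that limits R₁ and R₂ are expressed as fractions of the maximum derivative signal and that a peak is identified by the index of its highest point. Both are assumptions of this example, as are the function names.

```python
def peaks_above(deriv, threshold):
    """Index of the highest point in each contiguous run above threshold."""
    peaks, run = [], []
    # A trailing -inf acts as a sentinel that closes a run reaching the end.
    for i, value in enumerate(list(deriv) + [float("-inf")]):
        if value > threshold:
            run.append(i)
        elif run:
            # Merged peaks share one run; only the greatest one is counted.
            peaks.append(max(run, key=lambda k: deriv[k]))
            run = []
    return peaks


def count_peaks(deriv, r1, r2, steps):
    """Count how often each peak is found as the threshold steps r1 -> r2.

    r1 and r2 are fractions of the maximum signal; steps must be at least 2.
    Returns a dict mapping peak index to the number of thresholds above
    which that peak was counted.
    """
    top = max(deriv)
    counts = {}
    for s in range(steps):
        fraction = r1 + (r2 - r1) * s / (steps - 1)
        for idx in peaks_above(deriv, fraction * top):
            counts[idx] = counts.get(idx, 0) + 1
    return counts
```

With a low threshold, neighboring peaks tend to merge into one contiguous region, while a high threshold isolates only the tallest peaks, so each peak is counted over a different range of steps.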

For various embodiments of method 320 of FIG. 6, a threshold may be sequentially moved in a stepwise fashion between limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve, and the frequency of counted peaks determined. According to various embodiments, a normalization value may be obtained based on the highest frequency of counts for a peak, as:

Norm = max(N₁, N₂, . . . N_(n))

In this expression, N₁, N₂, . . . N_(n) represent the number of times peaks (P₁, P₂ . . . P_(n)) are counted in a stepwise count for a sequentially moving threshold value between limits R₁ and R₂ of an nth derivative plot of a multiphasic melt curve. Additionally, for various embodiments of method 320 of FIG. 6, a peak detection frequency value may be determined for each peak as:

Γ(n) = N₁/max(N₁, N₂, . . . N_(n)), N₂/max(N₁, N₂, . . . N_(n)), . . . N_(n)/max(N₁, N₂, . . . N_(n))

In this expression, the peak detection frequency for each peak is determined as a quotient of the number of times a peak is counted in a stepwise count for a sequentially moving threshold divided by the normalization value.

According to various embodiments of method 320 of FIG. 6, a rejection limit may be set on Γ(n), so that any peak having a value less than a selected limit is not counted:

X % = N_(r)/max(N₁, N₂, . . . N_(n))

In various embodiments, the rejection limit, X %, may be from about 0.5% to about 6%. In various embodiments, an end user may select a rejection limit. An example of various embodiments of method 320 of FIG. 6 is depicted in FIG. 8. In FIG. 8, 19 sequential stepwise threshold values were taken, and the number of peaks was determined at each of the 19 threshold values. In this example, P₁ is counted 3 times for a Γ(1) of 16%, P₂ is counted 1 time for a Γ(2) of 5%, and P₃ is counted 19 times for a Γ(3) of 100%. For a rejection limit set at 2%, all peaks in this example would be selected as peaks having a T_(m) value determined by the first derivative peak value. For a rejection limit set at 6%, P₂ would be rejected, and 2 peaks, P₁ and P₃, would be selected as peaks having a T_(m) value determined by the first derivative peak value.
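
The arithmetic of the FIG. 8 example may be reproduced with the short Python sketch below, which divides each peak count by the highest count and then applies a rejection limit. The function names and the use of a dictionary keyed by peak label are assumptions made for illustration.

```python
def detection_frequencies(counts):
    """Divide each peak's count by the highest count (the normalization value)."""
    norm = max(counts.values())
    return {peak: n / norm for peak, n in counts.items()}


def select_peaks(counts, rejection_limit):
    """Keep peaks whose detection frequency is not below the rejection limit.

    rejection_limit is a fraction, e.g. 0.06 for a 6% limit.
    """
    freqs = detection_frequencies(counts)
    return [peak for peak, f in freqs.items() if f >= rejection_limit]


# Counts from the FIG. 8 example: 19 threshold values in total.
fig8_counts = {"P1": 3, "P2": 1, "P3": 19}
```

Here 3/19 is about 16% and 1/19 is about 5%, so calling select_peaks(fig8_counts, 0.02) keeps all three peaks, while a limit of 0.06 drops P2.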

In various embodiments of systems and methods for the analysis of protein melt curve data according to the present teachings, and with respect to step 50 of FIG. 4 of method 300, and step 60 of FIG. 5 and FIG. 6 for methods 310 and 320, respectively, the analysis engine, as depicted in FIG. 3, may generate processed data. As previously discussed, the analysis engine may generate processed data from both primary and secondary inputs, in which a secondary input is any user input occurring subsequent to the primary input. In that regard, for various embodiments of systems and methods of the present teachings, an end user may iteratively select parameters for analyzing and displaying data. Such parameters may include, for example, but not limited by, analysis settings, analysis thresholds, methods for how a T_(m) may be determined, methods for comparing a T_(m) of a sample or replicate group to another sample or replicate group, replicate group display as a function of user-selected experimental variables, replicate group display as a function of a user-defined threshold, and plate set-up information entered by an end user as a secondary input. In this fashion, an end user may interactively and iteratively analyze data and subsets of data generated from both within and between run analyses for potentially large sets of protein melt curve data. In that regard, an end user may concurrently analyze data from any primary data stored on various types of computer readable media. Accordingly, an end user may concurrently analyze data from different instruments, from different runs, from different experimental conditions, or any combinations under which an end user may desire to select and analyze protein melt curve data. Results thus generated may be displayed graphically and additionally presented in tabular format. According to various embodiments, as will be discussed in more detail subsequently, such graphical and tabular displays may be synchronized dynamically on the same display. 
Accordingly, selecting row entries in a tabular format will highlight corresponding plots on the graphical display area. In various embodiments, an end user may independently zoom in on any graphic for detailed review of information. For various embodiments, information displayed in a tabular format may be sorted by, for example, using a mouse or key stroke to select any of a plurality of attributes entered as column header names, thereby providing for the information in the tabular format to be sorted by a selected attribute.

For example, as can be seen by inspection of Eq. 1, the Boltzmann fitting function has a term for a signal amplitude at an initial temperature and a signal amplitude at a final temperature. According to various embodiments, as shown in FIG. 9, an interactive GUI 400 may provide a display of a region of analysis defining an initial 410 and a final 420 temperature range for a Boltzmann fit to the data 430 in comparison to an nth order derivative of the data 435. In various embodiments, a first derivative of the data may be taken, as shown in FIG. 9. For various embodiments of an interactive GUI displaying a Boltzmann fit to the data 430 in comparison to a first derivative of the data 435, the data may be dynamically synchronized to a data table listing of the samples 450. For various embodiments, a sample line 452 may be highlighted in the table by an end user, which allows the corresponding data 440, 445 to be visually apparent in the Boltzmann fit data set 430 and first derivative set of data 435, respectively. In various embodiments, an end user may select any line or combination of lines for selective viewing of the corresponding graphs. Additionally shown on sample table 450 is an example of a flag icon 454, which may alert an end user to a number of factors impacting data quality and data analysis, as will be discussed in more detail subsequently. The dynamic synchronization of data table listing of samples 450 with data plots 430, 435 may help an end user visually evaluate selected groups of data, and may allow for rapid iteration of such evaluation. For example, but not limited by, such dynamic synchronization may allow an end user to evaluate whether or not a Boltzmann fit is an appropriate fit for selected groups of data. Moreover, flags alerting an end user to factors impacting data quality and data analysis may facilitate end-user review of critical issues impacting the overall quality of analysis.

While in FIG. 9 a graph dynamically selected by an end user is highlighted in a Boltzmann fit graph 440 and a first derivative graph 445, allowing comparison to the full data sets, in various embodiments of an interactive GUI according to the present teachings, as shown in FIG. 10A and FIG. 10B, an interactive GUI 500 may allow an end user to make a selection from a popup window 580 (FIG. 10A) that allows for viewing only the selected Boltzmann fit data 540 and the corresponding first derivative data 545 (FIG. 10B). In addition to well level data review within the context of all the data analyzed, an end user may want to assess the data within the context of the various experimental conditions. To this end, an assay condition studied may be input by an end user as associated with a color.

These color associations may be retained by the analysis engine and can be invoked in the graphical display as shown in FIG. 11, where the curves are color coded by the attribute value of a specific condition. The data used in the display shown in FIG. 11 were generated in a time-course study in which dye concentrations were varied, so the display uses color coding to indicate time and dye concentration. According to various embodiments, a different condition category may be selected by an end user from, for example, a drop down menu. According to various embodiments, an end user may use color to further enhance data analysis while maintaining the synchronized interactivity between the tabular and graphical formats. As previously discussed with respect to FIG. 3, if an end user edits plate set-up details during data review, such secondary input will be taken into consideration, and the attributes will be re-displayed with the new color coding for the well level attribute values.

As depicted in FIG. 11, the region of analysis is bounded by a first temperature selection 410 and a second temperature selection 420, which may be automatically determined or manually selected by an end user. In FIG. 11, the two sets of curves, 430 and 432, are taken from the same time point in the study, but indicate differences in dye concentrations. As depicted in FIG. 11, curves 435 and 437 are first derivative plots of curves 430 and 432, respectively. Sample table 450 indicates information for sample data selected from a single sample support device in a plurality of sample support devices that were used to define an analysis group for the data presented in part in FIG. 11.

As previously mentioned, analysis groups may be selected in a variety of ways by an end user. To recall, an end user may define sample data from an entire sample support device, such as a microtiter plate, as an analysis group. For various embodiments, an analysis group may comprise sample data from a plurality of sample support devices. In various embodiments, an analysis group may be defined by an end user as sample data from selected sample regions, such as wells from a microtiter plate, selected from one or a plurality of sample support devices. In various embodiments, sample data from sample regions selected from a single sample support device may be divided into a plurality of analysis groups. An analysis group may be comprised of data for one sample assayed under the same or different conditions, or may be comprised of a plurality of samples assayed under the same or different conditions, and any combination thereof. Further, an analysis group may be defined by an end user either as primary input before a run or as secondary input during post-run analysis. Though, for the purpose of illustration, two sets of sample data are displayed from this study in FIG. 11, window 470 indicates that the sample data included for display is taken from an analysis group defined by 352 sample regions, in this example, sample wells, taken from a plurality of sample support devices, in this example, microtiter plates. In various embodiments, sample data defining an analysis group may be selected from about 1 to about 100 sample support devices. As such, various embodiments of systems and methods according to the present teachings provide an end user with the capability of interactively displaying and dynamically analyzing a large and complex amount of data.

As previously discussed, protein melt data may be affected by a variety of analytical attributes, such as, but not limited by, curve shape, background signal, change in signal amplitude, and noise. Additionally, proteins as a class of biopolymers may have complex melt curves, given the complex effect of primary and secondary structure on tertiary and quaternary folding motifs. In that regard, providing an end user flexibility to evaluate complex protein melt curve data for a plurality of samples in a sequential and rapid manner through an interactive GUI may facilitate the data analysis process.

According to various embodiments of an interactive GUI according to the present teachings, interactive selections by an end user may be made, which enable the rapid and sequential evaluation of data for a plurality of samples in a protein melt curve experiment. For example, as shown in GUI 600 of FIG. 12, for curve 640, the evaluation of a T_(m) 642 determined for a Boltzmann fit of curve 640 versus a T_(m) 644 determined by an nth derivative 645 may provide an end user with a tool for evaluation of whether or not a Boltzmann fit is appropriate for the data being evaluated. As can be seen for this example, the T_(m) 642 determined for a Boltzmann fit 640 and the T_(m) 644 determined by a first order derivative 645 are fairly close. However, as will be discussed subsequently, given the complexity of protein melt data, the comparison may provide an end user with tools for deciding how a T_(m) may be determined.

As previously mentioned, the determination of a T_(m) for protein melt curve data may be done from Boltzmann fitted data, after a step of identifying the region of analysis. For various embodiments of an interactive GUI 700 of FIG. 13 according to the present teachings, an end user may readily and iteratively change the region of analysis for a Boltzmann graph and synchronously for an nth order derivative graph, such as a first derivative graph. This can be done by, for example, but not limited by, a drag and draw interactive tool. Such an interactive tool would allow an end user to select a new region of analysis by moving initial first bound 710 to a new first bound 711. Additionally, an end user may select a new region of analysis by moving initial second bound 720 to a new second bound 721. For various embodiments of systems and methods according to the present teachings, as shown in FIG. 3, an analysis engine may then generate and display data according to the new input from an end user. An iterative selection of a region of analysis may give an end user a rapid, visual means for understanding the impact of the selection of a region of analysis on the determination of a T_(m) for protein melt curve data.

Though a T_(m) determined using a Boltzmann fit may be evaluated by inspecting it in relation to a T_(m) determined using an nth order derivative, such as a first derivative, various embodiments of an interactive GUI according to the present teachings may also provide additional tools for such an evaluation. According to various embodiments of an interactive GUI 800 of FIG. 14, a Boltzmann fit 810 may be visually displayed coincident to a data curve 840, and synchronously to an nth order derivative curve, such as first derivative curve 845. As previously discussed for FIG. 12, a T_(m) 842 determined from the Boltzmann fitted data 810 may be directly compared to a T_(m) 844 determined from an nth order derivative curve 845.

The complex nature of protein structure lends itself to multi-phase melt curves, for which there may be a T_(m) determined for each phase transition. Such a set of multi-phase data is depicted in FIG. 14 and FIG. 15. For such multi-phase melt curve data, a Boltzmann fit may not be an appropriate fit model. In various embodiments of an interactive GUI 900 of FIG. 15, an end user may select a subset of data 952 from a sample table 950, which subset of data 952 for a multi-phase set of sample melt curves 930 becomes visually apparent 940, and synchronously apparent 945 for an nth order derivative set of curves 935. In this fashion, an end user may sequentially select any of a set of data from sample table 950, and view the data in the melt curve set 930, as well as in the nth order derivative set of curves 935. Such an interactive GUI display may allow an end user to readily determine a T_(m) for each phase transition of a protein displaying a multiphasic melt profile.

Alternatively, as shown in an interactive GUI 1000 of FIG. 16, an end user can specifically choose to display just the subset of data selected in the sample table. According to various embodiments of an interactive GUI according to the present teachings, an end user may evaluate a Boltzmann fit 1025 to a melt curve 1040 over a defined analysis region having a first bound 1010 and a second bound 1020, and having a T_(m) 1027 determined for that selected fit. For various embodiments of GUI 1000, a derivative curve 1045 may be synchronously displayed. As can be seen by inspection of FIG. 16, such a T_(m) determination may be significantly different for a Boltzmann fit selected as the mode or method of analysis than for each phase separately determined.

As previously mentioned, various embodiments of systems and methods of the present teachings may use color in a graphical display to provide an end user with a tool for visually identifying, for example, but not limited by, various experimental conditions. As such, discrete values of an attribute or condition studied may be color coded by input from an end user in plate set-up information. As shown in FIG. 16, color may also be utilized to distinguish between fitted curve 1025 and multiphasic curve 1040. Accordingly, various embodiments of systems and methods of the present teachings may utilize various format selections for graphic display of sample data, such as color or line type, to allow an end user to readily visually differentiate various graphic entries of sample data according to a plurality of variables.

Additionally, for either GUI 900 of FIG. 15 or GUI 1000 of FIG. 16, as previously discussed for FIG. 13, an end user may utilize an interactive means for selecting an analysis region for each of the phases in multi-phase melt data.

As shown, for example, but not limited by, in FIG. 9 in comparison to FIG. 15, the baselines for the plurality of melt curves 430 of FIG. 9 appear to be substantially aligned, while the baselines for the plurality of melt curves 930 in FIG. 15 appear to be shifted over a range of ordinate values. According to various embodiments of the present teachings, an end user may readily align the curves. As depicted in interactive GUI 1100 and 1110 of FIGS. 17A and 17B, respectively, an end user may select an alignment function from a popup box (not shown) for aligning curves 1140A and 1140B, as shown in FIG. 17A, resulting in the alignment as depicted in FIG. 17B. In various embodiments, once the alignment function is selected by an end user, a mean ordinate offset is calculated, and the curves in a selected set of curves are then adjusted to the mean value, so that they are aligned as shown in FIG. 17B. Such a baseline alignment feature may, for example, allow an end user to visually compare similarities and differences between curves in a selected set of curves. For example, but not limited by, such a baseline alignment feature may allow an end user to evaluate an x-offset, or may allow for the evaluation of the difference in curve shapes in a selected data set.
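
A baseline alignment of this kind may be sketched in Python as shown below. The text does not state how the ordinate offset of each curve is measured, so the sketch assumes that the baseline of a curve is the mean of its first few readings. That choice, the parameter baseline_points, and the function name are assumptions of the example.

```python
def align_baselines(curves, baseline_points=3):
    """Shift each curve so its baseline sits at the mean baseline of the set.

    curves is a list of equal-length lists of signal readings. The baseline of
    a curve is taken as the mean of its first baseline_points readings.
    """
    baselines = [sum(c[:baseline_points]) / baseline_points for c in curves]
    target = sum(baselines) / len(baselines)  # the mean ordinate offset
    return [[y + (target - b) for y in c] for c, b in zip(curves, baselines)]
```

Because each curve is shifted by a constant, curve shapes and temperature positions are unchanged, and only the vertical placement differs after alignment.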

Various embodiments of an interactive GUI according to the present teachings may facilitate an understanding of the impact of experimental variables on a plurality of samples in a protein melt data set. Such variables may include, for example, but not limited by, neutral salt type and concentration; chaotropic agent type and concentration; buffer type, pH, and concentration; protein sample; and analysis group. For various embodiments of systems and methods for the analysis of protein melt curve data, an analysis engine, as depicted in FIG. 3, may determine various replicate group statistics for a selected analysis group. Such replicate group statistics may include, for example, but not limited by, expressions of central tendency, such as mean and median values, as well as expressions of variance, such as standard deviation and CV %. As depicted in FIG. 3, such replicate group statistics, once determined from primary or secondary input, may then be displayed on an interactive user interface.
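
The replicate group statistics named above may be computed, for example, as in the following Python sketch, which uses the standard library statistics module. The function name and the use of the sample standard deviation are assumptions of the example. CV % is taken as 100 times the standard deviation divided by the mean.

```python
import statistics


def replicate_stats(tm_values):
    """Summarize replicate Tm values; requires at least two values."""
    mean = statistics.mean(tm_values)
    sd = statistics.stdev(tm_values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(tm_values),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }
```

For three replicate T_(m) values of 50.0, 52.0, and 54.0, the mean and median are both 52.0, the standard deviation is 2.0, and the CV % is about 3.85.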

For example, various embodiments of an interactive interface according to the present teachings may be displayed as depicted in frames I-III of FIG. 18. In the top two frames of FIG. 18, plot (I) is a visual depiction of replicate group statistics for a set of reference replicate samples. Replicate group statistics generated by an analysis engine on a user-selected set of replicate data may be visually represented.

According to various embodiments of an interactive GUI according to the present teachings, as depicted in FIG. 18, replicate group statistics may be visually represented in part by a diamond plot, as shown in plot (I). For this diamond plot, the mean of T_(m) values is indicated by a first vertical line intersecting a first set of apices, while a second set of apices intersected by the horizontal line represents the 95% confidence interval of the mean. The median of T_(m) values generated by the analysis engine on the user-selected set of replicate data is depicted as a second, distinct vertical line. For this example, the replicate group statistics for plot (I) were generated from user-selected data for a set of reference samples run on two separate sample support devices for separate analyses run using various embodiments of computer and instrument systems as previously described. The replicate T_(m) values for each sample from each plate may be readily visually identified as a first set depicted by white circles and a second set depicted by black circles. Such a visual display of central tendency and variance for T_(m) values may readily allow an end user to evaluate, for example, data quality. For example, the data point represented by an “X” is a data point that was selected by an end user to be omitted, based on an analysis flag, as will be discussed in more detail subsequently. Additionally, visual comparison of replicate group statistics for T_(m) values may allow for a ready understanding of experimental results. In that regard, the representation shown in plot II of FIG. 18 is for an experimental set of samples selected for comparison to the reference set of samples in plot I. As shown, at a glance, an end user may readily compare the replicate group statistics of the reference and experimental set. For example, but not limited by, an end user may readily discern at a glance that the average T_(m) for the experimental set of samples has been shifted to a higher temperature than the average T_(m) for the reference set of samples. Additionally, the shape of the diamond plot is visually about the same, indicating that the variance of the two data sets is approximately the same. According to various embodiments of systems and methods of the present teachings, a variety of geometric and line shapes, colors, and formats may be used to visually impart to an end user replicate group protein melt curve data.
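By way of non-limiting illustration, the geometry of such a diamond plot may be derived from replicate T_(m) values as in the following sketch. The function name `diamond_geometry` and the parameters `row` and `half_height` are hypothetical. The confidence interval is computed here from Student's t distribution using a short lookup table, with a coarse normal approximation (1.96) for larger groups; both choices are assumptions of this sketch, as the present teachings do not state how the 95% confidence interval is calculated.

```python
from math import sqrt
from statistics import mean, median, stdev

# Two-sided 95% Student's t critical values for 1 to 10 degrees of freedom.
T_CRIT_95 = [12.706, 4.303, 3.182, 2.776, 2.571,
             2.447, 2.365, 2.306, 2.262, 2.228]


def diamond_geometry(tm_values, row=0.0, half_height=0.4):
    """Return the mean, median, 95% CI, and four apices of a diamond plot.

    row and half_height are hypothetical layout parameters giving the
    vertical position and half the height of the diamond.
    """
    n = len(tm_values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    m = mean(tm_values)
    df = n - 1
    # Coarse normal approximation beyond the table (assumption of this sketch).
    t = T_CRIT_95[df - 1] if df <= len(T_CRIT_95) else 1.96
    half_width = t * stdev(tm_values) / sqrt(n)
    return {
        "mean": m,
        "median": median(tm_values),
        "ci_low": m - half_width,
        "ci_high": m + half_width,
        # Left and right apices mark the confidence interval of the mean;
        # top and bottom apices lie on the vertical line at the mean.
        "vertices": [
            (m - half_width, row),
            (m, row + half_height),
            (m + half_width, row),
            (m, row - half_height),
        ],
    }


diamond = diamond_geometry([55.0, 56.0, 57.0])
```

Under these assumptions, two replicate groups with similar variance and group size yield diamonds of similar width, consistent with the visual comparison of plots I and II described above.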

In FIG. 18, plot III shows a staggered set of replicate group statistics for a data set having a triphasic melt. As will be discussed in more detail subsequently, for such multiphasic protein melt transitions, an average T_(m) may be determined for each transition, and the associated variance visually reflected in a user interface.

As previously discussed in the example of flag 454 of FIG. 9, various embodiments of systems and methods for the analysis of protein melt curve data provide for alerting an end user, for example, but not limited by, about initial data attributes, such as quality of signal and detection of multiple phases, as well as analysis issues, such as problems with curve fitting and replicate group comparisons. In FIG. 19, a table showing some types of issues for which an end user may be alerted is shown. As can be seen in FIG. 19, for each type of issue that may be identified for various embodiments of systems and methods for protein melt curve analysis, an icon may be selected. The association of an icon with a flag may then act as a ready visual alert for the entire number of samples run, or any subset thereof. A subset of these flags are well-level (sample-level) flags that are not altered by plate set-up edits, which may be input by an end user as secondary input, according to FIG. 3. The remaining replicate group flags are altered by plate set-up edits, which may alter the designation of samples comprising a replicate group.

For various embodiments of an interactive GUI 1200 of FIG. 20, a replicate group table 1260 may be provided to an end user in addition to a sample well table 1250, as previously discussed. Replicate group table 1260 may include a selection of replicate samples belonging to an identified group 1262 of a sample protein 1264, as well as various experimental variables (1266, 1268). According to various embodiments of systems and methods of the present teachings, replicate group data (1230-1236) may be viewed, for example, with respect to T_(m) values for a selected set of experimental variables 1270-1276, which in GUI 1200 are displayed as buffer, salt, analysis group, and protein, respectively. The replicates may be represented by diamond plots 1230-1236, which, as was previously discussed, visually convey important information about replicate group central tendency and variance. Non-overlapping diamond plots visually display the impact that experimental variables may have on, for example as shown in FIG. 13, the T_(m) values for a selected set of replicates. As shown in interactive GUI 1200 of FIG. 20, experimental variables are shown both in the replicate group table 1260 and next to the display of the replicate group data (1230-1236).

In various embodiments of an interactive GUI according to the present teachings, a popup box may be used to change the order of the variables, and hence the hierarchy. This may allow an end user to sequentially and rapidly change experimental variables and evaluate the impact of the variables on replicates selected from a set of protein melt samples. For example, in FIG. 21 and FIG. 22, such a change is demonstrated. For FIG. 21, various embodiments of an interactive GUI 1300 may have a conditional hierarchy tree popup box 1380, which displays a selection of variables. An end user may select any experimental variable and then move a selected variable 1395 to a new position as shown in conditional hierarchy tree popup box 1390. In FIG. 22, it is clear in comparison to diamond plots 1230-1236 of FIG. 20 that the buffer has been shifted to another position, so that a new set of diamond plots 1430-1436 may be viewed. The ordering of variables 1470-1476 of FIG. 22 has now been reset versus the ordering of variables 1270-1276 of FIG. 20. In FIG. 23, still another selection of order has been made for 1570-1576 of GUI 1500. It is clear that an end user may readily pick out the shift of diamond plots 1530-1532 as a function of analysis group in comparison to, for example, salt concentration, as shown in FIG. 22.
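By way of non-limiting illustration, the effect of reordering the variable hierarchy may be sketched as a regrouping of replicate T_(m) values keyed by the variables in the user-selected order. The function name `group_by_hierarchy`, the dictionary layout of each well, and the example buffer and salt names are all hypothetical and are not taken from the figures.

```python
from collections import defaultdict


def group_by_hierarchy(wells, order):
    """Group replicate T_m values by experimental variables in a given order.

    wells: list of dicts, each holding experimental variables and a "tm" value.
    order: list of variable names; the first name is the top of the hierarchy.
    """
    groups = defaultdict(list)
    for well in wells:
        # The key follows the user-selected ordering of variables.
        key = tuple(well[var] for var in order)
        groups[key].append(well["tm"])
    # Sorting the keys places groups in hierarchy order for display.
    return dict(sorted(groups.items()))


wells = [
    {"buffer": "HEPES", "salt": "NaCl", "tm": 55.1},
    {"buffer": "HEPES", "salt": "KCl", "tm": 56.0},
    {"buffer": "Tris", "salt": "NaCl", "tm": 57.2},
    {"buffer": "HEPES", "salt": "NaCl", "tm": 55.3},
]
by_buffer = group_by_hierarchy(wells, ["buffer", "salt"])
by_salt = group_by_hierarchy(wells, ["salt", "buffer"])
```

The same replicate groups appear in both results; only their display order changes, so that groups sharing the top-level variable sit next to one another for comparison.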

FIG. 24A-FIG. 24C depict various embodiments of an interactive GUI 1600 displaying processed data for a replicate set of reference and experimental data for a protein having a biphasic melt profile. In FIG. 24A, various embodiments of GUI 1600 may display a first set of protein melt curves 1630 and the corresponding first set of derivative curves 1635. Additionally, various embodiments may provide an end user the selection of an additional set or sets of protein melt curves, such as protein melt curve set 1630′ and corresponding set of derivative curves 1635′, which may be viewed simultaneously with a first set. In this fashion, an end user may dynamically compare sets of data taken, for example, using different experimental conditions, or taken from a different run. For various embodiments of systems and methods for analyzing protein melt curve data, a first curve set may appear visually more distinct than a selected second set, allowing a focus on interactive analysis of a first set, while maintaining a visual reference to a second set. In FIG. 24A, for example, a first reference set of protein melt curves 1630 and corresponding derivative curves 1635 is more pronounced than an experimental set of protein melt curves 1630′ and corresponding derivative curves 1635′. In comparison, in FIG. 24B for GUI 1610, the set of experimental protein melt curves 1630′ and corresponding derivative curves 1635′ have been selected by an end user for analysis and viewing, and appear more pronounced than the reference set of protein melt curves 1630 and corresponding derivative curves 1635. Additionally, an end user is able to select regions of analysis for the reference set of protein melt curves, as shown in FIG. 24A and FIG. 24B. In FIG. 24A, for example, for the reference set of protein melt curves, a first phase of the protein melt is bounded by a region selected by 1610-1620, while for the second phase of the protein melt, the region bounded by 1611-1621 has been selected.
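By way of non-limiting illustration, determining a T_(m) for each phase of a biphasic melt within an end-user-selected region may be sketched as follows. The function name `tm_from_derivative` is hypothetical, and taking T_(m) as the temperature of the maximum first derivative within the region is an assumption of this sketch; the present teachings refer more generally to T_(m) values defined by curve fitting or by an nth order derivative. The melt curve below is synthetic.

```python
from math import exp


def tm_from_derivative(temps, signal, region):
    """Return the temperature of maximum first derivative inside a region.

    region: (low, high) temperature bounds selected for one melt phase.
    """
    low, high = region
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        if not (low <= temps[i] <= high):
            continue
        # Central-difference estimate of the first derivative.
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def sigmoid(t, midpoint, width=1.5):
    return 1.0 / (1.0 + exp(-(t - midpoint) / width))


# Synthetic biphasic melt with transitions near 55 and 75 degrees.
temps = [float(t) for t in range(40, 91)]
signal = [sigmoid(t, 55.0) + sigmoid(t, 75.0) for t in temps]

tm_phase_1 = tm_from_derivative(temps, signal, (45.0, 65.0))
tm_phase_2 = tm_from_derivative(temps, signal, (65.0, 85.0))
```

Restricting the search to a bounded region is what allows each transition of a multiphasic melt to be assigned its own T_(m).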

In FIG. 24C, for various embodiments of an interactive GUI 1700, replicate group statistics are depicted for a protein displaying a biphasic melt, as shown in FIGS. 24A and 24B. In the top view, first diamond plot 1730 and second diamond plot 1732 depict the replicate group statistics for the first and second melt phases of selected curves 1630 as shown in FIG. 24A. In the bottom view, first diamond plot 1740 and second diamond plot 1742 depict the replicate group statistics for the first and second melt phases of selected curves 1630′ as shown in FIG. 24B. For example, a ligand may actually shift peaks 1740, 1742 in comparison to a native protein melt 1730-1732. Such a comparison may become readily observable to an end user by using various embodiments of replicate group visualization, according to various embodiments of systems and methods of the present teachings.

In addition to ready inspection of the impact of experimental variables on a selected group of replicates from a plurality of samples, the diamond plots may be evaluated with respect to an end-user-defined threshold value. For various embodiments of GUI 1800 of FIG. 25A, a popup box 1880 may be selected by an end user. According to various embodiments of systems and methods of the present teachings, a positive hits setting box 1882 may allow an end user to select a negative threshold value for ΔT_(m) 1884 defined by a method of curve-fitting of the data, as well as to select a negative threshold value for ΔT_(m) 1886 defined by an nth order derivative. For example, in GUI 1800 of FIG. 25A, an end user would like to particularly highlight any replicate group having a T_(m) more than 3 units lower than that of the reference. In GUI 1900 of FIG. 25B, a threshold so selected may allow an end user to see the position of the diamond plots 1940-1948 of groups below the selected threshold of a selected reference. In GUI 1900 of FIG. 25B, a reference diamond plot 1930 is shown. While diamond plots for experimental sets 1940-1948 are highlighted as falling below the selected threshold, the diamond plot 1932 clearly falls outside the selection range. In that regard, diamond plots 1940-1948 are visually highlighted as replicate groups of interest to an end user.

Further, as shown in FIGS. 26A and 26B, an end user may also select a positive threshold value. For various embodiments of GUI 2000 of FIG. 26A, a popup box 2080 may be selected by an end user. According to various embodiments of systems and methods of the present teachings, a positive hits setting box 2082 may allow an end user to select a positive threshold value for ΔT_(m) 2084 defined by a method of curve-fitting of the data, as well as to select a positive threshold value for ΔT_(m) 2086 defined by an nth order derivative. For example, in GUI 2000 of FIG. 26A, an end user would like to particularly highlight any replicate group having a T_(m) more than 2 units higher than that of the reference. In GUI 2100 of FIG. 26B, a threshold so selected may allow an end user to see the position of the diamond plots 2140-2147 of groups above the selected threshold of a selected reference. In GUI 2100 of FIG. 26B, a reference diamond plot 2130 is shown. While diamond plots for experimental sets 2140-2147 are highlighted as falling above the selected threshold, the diamond plots 2132-2136 clearly fall outside the selection range. In that regard, diamond plots 2140-2147 are visually highlighted as replicate groups of interest to an end user.

According to various embodiments of an interactive GUI of the present teachings as depicted in FIGS. 25A, 25B, 26A and 26B, such an interactive GUI may readily allow an end user to evaluate whether or not a set of experimental values had a desired impact. For example, if an experiment was designed to see whether or not a set of variables might increase or decrease the T_(m) values for a selected set of replicates, then a threshold for an expected value may be set, and the diamond plots evaluated against the expected threshold.
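By way of non-limiting illustration, evaluating replicate groups against a positive or negative ΔT_(m) threshold may be sketched as follows. The function name `flag_hits`, the group labels, and the numeric values are hypothetical. Comparing each group's mean T_(m) to the reference mean, and treating the comparison as strict, are assumptions of this sketch.

```python
def flag_hits(reference_tm, group_tms, threshold):
    """Mark replicate groups whose delta T_m passes a signed threshold.

    reference_tm: mean T_m of the reference replicate group.
    group_tms: dict mapping a group label to that group's mean T_m.
    threshold: negative to find destabilized groups, positive to find
    stabilized groups (strict comparison, assumed for this sketch).
    """
    hits = {}
    for label, tm in group_tms.items():
        delta = tm - reference_tm  # delta T_m relative to the reference
        if threshold < 0:
            hits[label] = delta < threshold
        else:
            hits[label] = delta > threshold
    return hits


group_tms = {"A": 52.5, "B": 55.0, "C": 58.5}
negative_hits = flag_hits(56.0, group_tms, threshold=-3.0)
positive_hits = flag_hits(56.0, group_tms, threshold=2.0)
```

In this sketch, groups marked `True` correspond to the diamond plots that would be visually highlighted, while the remaining groups fall outside the selection range.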

Finally, regarding FIG. 4, step 60 as well as FIG. 5 and FIG. 6, step 70, as previously discussed, and as one of ordinary skill in the art may readily recognize, there are various ways of outputting protein melt curve information, for example, but not limited by, melt curve plots, T_(m) values, and ΔT_(m) values, to an end user in numerous formats using numerous devices. For example, with respect to format of protein melt curve information, the information may be presented in a graphical format, as a written report, or combinations thereof. With respect to output devices, protein melt curve information may be output to devices such as, but not limited by, a printer, a cathode ray tube (CRT) display, a liquid crystal display (LCD), and a light-emitting diode (LED) display.

While the principles of various embodiments of systems and methods for the analysis of protein melt curve data have been described in connection with specific embodiments, it should be understood clearly that these descriptions are made only by way of example and are not intended to limit the scope of the invention. What has been disclosed herein has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit what is disclosed to the precise forms described. Many modifications and variations will be apparent to the practitioner skilled in the art. What is disclosed was chosen and described in order to best explain the principles and practical application of the disclosed embodiments of the art described, thereby enabling others skilled in the art to understand the various embodiments and various modifications that are suited to the particular use contemplated. It is intended that the scope of what is disclosed be defined by the following claims and their equivalents.

1. A system comprising: a processor; and a memory in communication with the processor; the memory storing instructions for: receiving by the processor a set of protein melt curve data; generating and displaying a first data set of processed melt curve data from the initial melt curve data; and presenting an end user with an interface for the interactive analysis of the first data set of processed protein melt curve data, wherein the interactive analysis comprises generating a display of a second data set of processed protein melt curve data, wherein the second set of processed protein melt curve data is generated in response to the user input.
2. The system of claim 1, wherein the protein melt curve data comprises a detector response value as a function of temperature for an end-user selected analysis group.
3. The system of claim 1, wherein the display of the first data set and second data set is a display of a Boltzmann fit of the data.
4. The system of claim 1, wherein the display of the first data and second data set is a display of an nth derivative plot of the data.
5. The system of claim 1, wherein the display of the first data and second data set is a diamond plot displaying replicate group central tendency and variance.
6. A computer-readable medium encoded with instructions, executable by a processor, for analyzing protein melt curve data, the instructions comprising instructions for: receiving a set of protein melt curve data for a plurality of samples; generating and displaying a first data set of processed melt curve data from the initial melt curve data; and presenting an end user with an interface for the interactive analysis of the first data set of processed protein melt curve data, wherein the interactive analysis comprises generating a display of a second data set of processed protein melt curve data, wherein the second set of processed protein melt curve data is generated in response to the user input.
7. The computer-readable medium of claim 6, wherein the protein melt curve data comprises a detector response value as a function of temperature for an end-user selected analysis group.
8. The computer-readable medium of claim 6, wherein the display of the first data set and second data set is a display of a Boltzmann fit of the data.
9. The computer-readable medium of claim 6, wherein the display of the first data and second data set is a display of an nth derivative plot of the data.
10. The computer-readable medium of claim 6, wherein the display of the first data and second data set is a diamond plot displaying replicate group central tendency and variance.
11. A computer implemented method for determining a genotype for a genomic locus in a biological sample, the method comprising: receiving by a processor a set of protein melt curve data; processing the set of protein melt curve data on a computer, the processing comprising: generating and displaying a first data set of processed melt curve data from the initial melt curve data; and presenting an end user with an interface for the interactive analysis of the first data set of processed protein melt curve data, wherein the interactive analysis comprises generating a display of a second data set of processed protein melt curve data, wherein the second set of processed protein melt curve data is generated in response to the user input.
12. The computer implemented method of claim 11, wherein the protein melt curve data comprises a detector response value as a function of temperature for an end-user selected analysis group.
13. The computer implemented method of claim 11, wherein the display of the first data set and second data set is a display of a Boltzmann fit of the data.
14. The computer implemented method of claim 11, wherein the display of the first data and second data set is a display of an nth derivative plot of the data.
15. The computer implemented method of claim 11, wherein the display of the first data and second data set is a diamond plot displaying replicate group central tendency and variance.